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Abstract 



In this paper, we investigate a formula to solve systems of the form (B + 
al)x = y, where B is a limited-memory BFGS quasi-Newton matrix and 
a is a positive constant. These types of systems arise naturally in large- 
scale optimization such as trust-region methods as well as doubly-augmented 
^ '. Lagrangian methods. We show that provided a simple condition holds on Bq 

and a, the system (B + al)x = y can be solved via a recursion formula that 
requies only vector inner products. This formula has complexity M 2 n, where 
O " M is the number of L-BFGS updates and n>Mis the dimension of x. 

. Keywords: L-BFGS, quasi-Newton methods, limited-memory methods, 

inverses, Sherman-Morrison- Woodbury, diagonal updates 



1. Introduction 

>< 

$_i ; In this paper we develop a recursion formula for solving systems of the 

form (Bk + al)x = y, where Bk is the fc-th step n x n limited-memory 
(L-BFGS) quasi-Newton Hessian (see e.g., 0, Oil, 21, [25[), a is a positive 



constant and x, y G dt n . Linear systems of this form appear in both large-scale 
unconstrained and constrained optimization. For example, these equations 
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arise in the optimality conditions for the two-norm trust-region subproblem 



and the so-called doubly- augmented system and their applications [9l. llOl. Il2 



Additionally, matrices of the form Bk + o~I can be used as preconditioners for 
H + al, where H is often the Hessian of a twice-continuously differentiable 
function /. 

The algorithm proposed in this paper is an extension of the Sherman- 
Morrison- Woodbury (SMW) formula to compute the inverse of Bk + al. The 
algorithm begins by pairing the initial L-BFGS matrix Bq with the initial 
diagonal update al; then, the algorithm uses the SMW formula to compute 
the inverse of Bk + al after updating with a sequence of the L-BFGS updates. 
In this paper, we derive a recursion formula for efficiently computing matrix- 
vector products with this inverse. The proposed algorithm requires 0(M 2 n) 
multiplications, where M is the maximum number of L-BFGS updates. 

The structure of the remainder of the paper is as follows. In £J21 we 
motivate in detail why solving systems of the form (Bk + crl)x = y is crucial 
in several optimization settings. Specifically, we consider its use in solving 
the trust-region subproblem and in the preconditioning of doubly-augmented 
system in barrier methods. In §31 we provide an overview of the L-BFGS 
quasi- Newton matrix, including operation counts of the well-known recursive 
formulas for operations with the quasi-Newton matrix. In §H we consider 
the formula for solving systems of the form (Bk + al)x = y and show how it 
compares computationally with direct and indirect methods in §5J We offer 
potential extensions of this formula in £0 and draw some conclusions in 

In this paper, we assume that updates are chosen that ensure all L-BFGS 
quasi-Newton matrices are positive definite. 



2. Motivations from large-scale optimization 

Systems of the form (B + al)x = y, where B is a quasi-Newton Hessian 
appear throughout large-scale, nonlinear optimization. In this section, we 
motivate our research by presenting two instances in optimization that would 
benefit from a recursion formula to directly solve systems with a system 
matrix of the form B + al. The first motivation is in trust-region methods 
for unconstrained optimization; the second motivation comes from barrier 
methods for constrained optimization. 

Motivation 1: Trust-region methods. Trust region methods are one of 
the most popular types of methods for unconstrained optimization. Trust 
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region methods have been extended into the quasi-Newton setting by using 
BFGS or L-BFGS updates (see, for example, [ll, I, B H E 0, 0, E3, S, H ) • 
The bulk of the work for a trust-region method occurs when solving the trust- 
region subproblem. Given the current trust-region iterate x, the two-norm 
trust-region subproblem for minimizing a function / is given by 



minimize q(s) 



q 1 s + -s 1 Bs 
y 2 



subject to || s || 2 < 5, 



(1) 



where g = V/(x), B is an L-BFGS quasi-Newton approximation to the Hes- 
sian of / at x, and 5 is a given positive trust-region radius. 

Optimality conditions for the L-BFGS quasi-Newton trust -reg ion sub- 
problem are summarized in the following result (adapted from [ilfli. 23, 29|). 



Theorem 1. Let 5 be a given positive constant. A vector s* is a global 
solution of the trust-region problem ([!]) if and only if \\s*\\2 < 5 and there 
exists a unique a* > such that 



(B + a*I)s* = —g and a*(5 — ||s*||2) 
Moreover, the global minimizer is unique. 



0. 



(2) 



The More-Sorensen method 22J is the preferred direct solver for the gen- 
eral trust-region subproblem. The method seeks a solution (x*,a*) that 
satisfies the optimality conditions for the trust-region subproblem (in this 
case (T5])) to any prescribed accuracy. The algorithm below summarizes the 
More-Sorensen method 22| for the general trust-region subproblem: 



Algorithm 2.1: More-Sorensen method. 

Let a > with B + a I positive definite and 5 > 
while not converged do 

Factor B + al = R T R; 

Solve R T Rp = —g; 

Solve R T q = p; 

*w + (g) 2 (^) ; 

end 
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Notice that the More-Sorensen method solves systems of the form (B + 
al)s = —g at each iteration by computing the Cholesky factorization of 
B + al. For general large-scale optimization, where B is not a quasi-Newton 
Hessian, this method is often prohibitively expensive if one cannot exploit 
structure in the system matrix B. However, in the quasi-Newton setting, it 
is possible to compute the Choleksy factorization of a BFGS matrix of 



by updating the factorization for B^ (see, for example, |16I]). 

In this paper, we develop a method for solving the system (B + aI)s = —g 
using a recursion relation in place of using Cholesky factorizations. It is then 
possible to continue on with the More-Sorensen method by updating a in 
accordance with the ideas proposed in [22] . (Note: The source of the formula 
for updating o in Algorithm 12.11 is based on applying Newtons method to 
the second optimality condition in (j2J).) The recursion formula proposed in 
this paper extends the More-Sorensen method into the quasi-Newton setting 
without having to update Cholesky factorizations. 

Motivation 2: Barrier methods. The second motivating example comes 



from barrior methods for constrained optimization (see, e.g., Ill, |3(|). Con- 
sider the following problem: 

minimize f(x) subject to c(x) > 0, (3) 

where f(x) : 5R n — > 9ft is a real- valued function and c(x) : 3ft n — > 3? is a 
quadratic constraint of the form c(x) = |<5 2 — \x T x, i.e., a two-norm con- 
straint on the size of x where 5 is a positive constant. (Note this can be 
considered a generalization of the trust-region subproblem.) A sufficient 
condition for a point x* to be the minimizer of ([3]) is the existence of the 
Lagrange multiplier A* satisfying the following: 

Vc(x*)A* = g(x*), with H(x*) + X*I positive definite, , , 

c(x*)\* = 0, with A* > if c{x*) = and A*,c(x*) > 0, U 

where g(x) = V/(x) is the gradient and H{x) = V 2 f(x) is the Hessian of 
f(x). Primal-dual methods p, [l3|, |26j solve this type of problem by solv- 



ing a sequence of perturbed problems. Specifically, we define the function 



: 9fJ n+1 -> 9fJ n+1 with 



F»{x,\) 



g(x) + \x 
c(x)X — fi 
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for some fixed perturbation parameter /i > 0. Note that Vc(x) = — x and as 
\i — > 0, the root of F^(x, A) converges to a point that satisfies the equations 
in (gD. 

The Newton equations associated with finding a root of F^x, A) are given 

by 

'H{x) + XI x \ f Ax\ _ ( g(x) + Ax 
—\x T c(x) J \AX J ~~ \c(x)X - (i 

By dividing the second row by —A and letting d = c(x)/X, we get the sym- 
metric system 



(5) 



H(x) + XI x \ f Ax\ f g(x) + Xx\ 
x T -d)\AX)~~\{ l i-d)/X)' 



(6) 



Unfortunately, the system matrix in ()6]) is indefinite even when H + XI is 
positive definite; however, after rearranging terms, it can be shown that this 
system is equivalent to the doubly- augmented system 

(7) 



H(x) + XI + 2xx T -x\ ( Ax\ _ ( g(x) + Xx + (2/d)(/x - d)x 



-x 



T 



d J \AXJ ' \ (d-jj)/X 



which is a positive-definite system when H + XI is positive definite [12 
The linear system (J7j) can be preconditioned by the matrix 

B + XI + 2xx T -x 
— x T d 



(8) 



where B is an L-BFGS quasi-Newton approximation to V 2 f(x). Solves with 
the preconditioner P for a general system of the form 

B + XI + 2xx T -x\/ , x 1 \ fyi 
-x T d ) \x 2 ) \V2 

can be performed by solving the following equivalent system: 

(see 0, [l2{]). Note that the inverse of the system matrix in ([9]) is given by 
B + XI xY 1 (I -w\ ({B + XI)- 1 \ f I 



x 



-l) = \0 1J\ -(l + w T xy l )\-w T I 



where w = (B + XI^x (for details, see [l2[). Thus, provided that solves 
with B + XI are efficient, solves with P are efficient. 

In this paper, we develop a recursion formula that makes solves with 
(B + \I)x = y. Thus, this recursion formula allows the use of preconditioners 
the form ([8]) where B is a L-BFGS quasi-Newton approximation of V 2 /. 



3. The Limited-memory BFGS method 

In this section, we review the limited-memory BFGS (L-BFGS) method 
and state important and well-known recursion formulas for computing prod- 
ucts with an L-BFGS quasi-Newton Hessian and its inverse. 

The BFGS quasi-Newton method generates a sequence of positive-definite 
matrices {B k } from a sequence of vectors {y k } and {s k } defined as 

Vk = V/(x fc+ i) - f(x k ) and s k = x x+1 - x k , 

respectively. The L-BFGS quasi-Newton method can be viewed as the BFGS 
quasi-Newton method where only at most M recently computed updates are 
stored and used to update the initial matrix Bq. Here M is a positive constant 
with M < n. The L-BFGS quasi-Newton approximation to the Hessian of 
/ is implicitly updated as follows: 

fe-i fc-i 

B k = b - ^ T + Yl bi %i ( 10 ) 



where 



i=0 i=0 



B f l , bi — — J^=, B = 'j k 1 I, (II) 



and 7^ > is a constant. In practice, ■j k is often taken to be ^ k = x 2/^ — i / 1 1 — i 1 1 i 



19 or m 



(see, e.g. 

Suppose that we have computed k updates (k < M — 1) and have the 
following updates stored in S and Y: 

S = [s . . . s fc _i] and Y = [y . . . y k -i] . 

We update S and Y with the most recently computed vector pair (s k ,y k ) as 
follows: 
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Algorithm 3.1: Update S and Y. 

if k < M- 1, 

5 <-[S Sfc ]; y k \- k^k + 1; 

else 

for z — 0, . . . k — 1 

Sj <- s i+ x; yi <- y i+1 ; 

end 

S 4- [s Q , . . .s fc _i]; F ■<— [y , • • - Vk-i]] 
end 



Thus, at all times we have exactly k stored vectors with k < M — 1. 

For the L-BFGS method, there is an efficient recursion relation to com 
pute products with B^ 1 . Given a vector z, the following algorithm 25|, |26 



terminates with r = B 



A D-l. 



Algorithm 3.2: Two-loop recursion to compute r = B k l z. 

for z = fc — 1 , . . . , 

«i ( s Iq)/(yT s i); 

q <- q- aaVi] 
end 

r <- S- 1 ?; 

for i = 0, . . . , k — 1 

P <- (yfr)/(yfs l ); 

r <- r + (oti- (3)sf. 
end 



Pre-computing and storing 1/yfsi for < i < k — 1, makes Algorith m 13.21 
even more efficient. Further details on the L-BFGS method can found in [26[; 
further background on the BFGS can be found in j5|. The two-term recur- 
sion formula requires at most OiMn) multiplications and additions. There 
is compact matrix representation of the L-BFGS that can be used to com- 
pute products with the L-BFGS quasi-Newton matrix (see, e.g., [26j]). The 
computational complexity at most OiMn) multiplications. 
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There is an alternative representation of B k 1 from which the two-term 
recursion can be more easily understood: 

^^(Ci-^rt-'-U + -^(V k T _ 1 ---V 1 T )s sUV 1 ---V k _ 1 ) 

Uq s o 

+ -^{Vl 1 ---V?)s l s T 1 {V 2 ---V k . 1 ) 
+ ■■■ 

+ — Sk-xsZ-!, (12) 



where Vi = I — -^-yisj (see, e.g., 26]). The first loop in the two-term 



recursion for B k l z computes and stores the products (Vj ■ ■ ■ V k -\)z for j = 
0, . . . , k — 1; in between the first and second loop, B^ 1 is applied; and finally, 
the second loop computes the remainder of (TT2l) . Computing the inverse of 
B k + al is not equivalent to simply replacing B^ 1 in the two-loop recursion 
with (B + al)~ l . To see this, notice that replacing Bq 1 in ( fl2|l with (B + 
ul)^ 1 would apply the updates Vi to the full quantity (B + crl)^ 1 instead 
of only Bq 1 . The main contribution of this paper is a recursion formula 
that computes (B k + al)~ 1 z in an efficient manner using only vector inner 
products. 



4. The recursion formula 

Consider the problem of finding the inverse of B + a I, where B is an 
L-BFGS quasi-Newton matrix. The Sherman-Morrison- Woodbury (SMW) 
formula gives the following formula for computing the inverse of A + UV T , 
assuming A is invert ible (see (l7j|): 

(A + UV T Y l = A" 1 - A^U{I + V T A~ 1 U)~ 1 V T A' 1 . 

In the special case when UV T is a rank-one update to A, this formula becomes 

(A + uv T )- 1 = A- 1 - A-\{I + v T A- 1 u)- 1 v T A-\ 

where u and v are both n-vectors. For simplicity, first consider computing 
the inverse of an L-BFGS quasi-Newton matrix after only one update, i.e., 
the inverse of B\ + al. Recall that 

B x + al = ( 7l - x + a)I - a a% + b b% . 
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To compute the inverse of this, we apply SMW twice. To see this clearly, let 

Co = (li 1 + <r)I, C 1 = (7^ + a) I - a oa T, C 2 = (7^ + a)I - a a^ + b b^ . 

Applying SMW to C\ yields 

^ = C^ + C^a a {l-alC^a Q )- l alC^ (13) 

1 

(1 - a^C^ao) 
Applying SMW once more we obtain C^ 1 from C-f : 



C o' + ~ T^-^C^aoa^Co 1 . (14) 



giving an expression for {B\ + cr/)" 1 . This is the basis for the following 
recursion method that appears in 20 



Theorem 2. Let G and G + H be nonsingular matrices and let H have 
positive rank M . Let H = Eq + E\ + • • ■ + Em-i where each Ek has rank one 
and Ck+i = G + E + ■ ■ ■ + E k is nonsingular for i = 0, . . . M — 1 . Then if 
C = G 

C^^C^-v^EuC^, fc = 0,...,M-l, (15) 

where 

Vk = 1 + trace (C^E k ) ' (16) 

In particular, 

(G + H )~ = C A/ 1 _ 1 - wa/-iC , a/-i- e a/-iC A / 1 _i- (17) 



Proof. See (20|. □ 

We now show that applying the above recursion method to B k + al, the 
product (B k + al)~ 1 z can be computed recursively, assuming ^ k a is bounded 
away from zero. 

Theorem 3. Let j k > and a > with ji-a > e for some e > 0. Let 
G = Bq + a I = (y^ 1 + a) I, and let H = Y^a=o 1 where 

Eq = — a aQ , Ei = boby , . . . , E 2k -2 = —a k ^ia[_i, E 2k -i = 

Then (B k + cl)~ l = (G + if) -1 is given by ([77}) together with (T7J 
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Proof. Notice that this theorem follows from Theorem [2J provided we satisfy 
its assumptions. By construction, + o~I = G + H. Both By. and By + ol 
are nonsingular since Bk is positive definite and a > 0. It remains to show 
that Cj, which is given by 

3-1 3-i 
°3 = G + J2 Ei = + ^2 E i) + aI > 

i=0 i=0 

is nonsingular for j — 1, . . . k — 1, for which we use induction on j. 

Since Ci = C — an^o — Co(I ~~ Co~ la o a o )> the determinant of Ci and C 
are related as follows 7J: 

det(Ci) = det(C )(l - a^C^a ). 

In other words, C\ is invertible if Co is invertible and aoC^ao ^ 1. We 
already established that Co is invertible; to show the latter condition, we use 
the definition of ao = BqSq/ a/ s^BqSq together with Cq -1 = (7 / 7 1 + cr)" 1 / to 
obtain the following: 

r r -i n 7 fc " 2 (7 fc " 1 + ^)" 1 

Ik s o s o 

1 



1k{l k 1 4 
1 

1 + IkO- ' 



(18) 



By hypothesis, 7^0" > e, which implies that det(Ci) 7^ 0; thus, C\ is invertible. 

Now we assume that Cj is invertible and show that Cj + i is invertible. If 
j is odd, then j + 1 = 2i for some % and Cj+i = Bi + o~I, which is positive 
definite and therefore nonsingular. If j is even, i.e., j = 2i for some i, then 
Cj = Bi + a I , and 

Cj+i = — ajOj = Bi =— — BiSiSi B i + o~I 

Si BiSi 

We will demonstrate that Cj + i is nonsingular by showing that it is positive 



10 



definite. Consider z G ffi n with z 7^ 0. Then 

z T C j+1 z = z t (b,i ^r- — BiSisfB? jz + cr\\z\\l 

T D { zT BiSi) 2 2 
= Z BiZ ^t-t h cr 2 2 

,1/2 xT/,,1/2,, x\2 



|V2„ 112 



I TJ 1 A 112 \\ J / \ x 1 112 



= H^ 2 ,^ - IIB/^llico^MB^z.B/^)) + 

> 0. (19) 

We have now satisfied all the assumptions of Theorem [21 Therefore, (Bj + 
is given by (fTTj) together with f|T5|) . □ 

Now, we show r = C^hz in ( IT5|) can be computed efficiently using recur- 
sion. We note that using ( IT5|) . we have 



cri^ = cz x z-v h cz x E h cz x z 



C k 1 z + WfcC fc l aka\C k z if is even 



2 2 



fc+1 k w k ~*~ k « - ^c-h-v.C^b^b-UC-'z if fc is odd 

The quantity is obtained using (fT6|) and computing trace(C i T 1 £'fc), which 
after substituting in the definition of E k and computing the trace, is given 
by 

-o^/ 2 C k ~ 1 a k /2 if k is even 

hT (k-i)/2 C k\k-i)/2 if fc is odd 
If we define p k according to the following rules 



trace (C k l E k 



then 



Vk = { C \ ak " if k iS 6Ven (20) 
C k 6( fc _i)/2 if k is odd, 



if k is even 



Vk = s l-pla k/2 

7p- if k is odd, 

l+Pfc&(*-l)/2 

11 



and thus, 

= C^z+(-l) k v k (p T k z) Pk . 
Applying this recursively yields the following formula: 

k 

C k \ x z = C - 1 z + ^(-l)^(^) ft , (21) 

i=0 

for k > and with C^z = (7 A T 1 + a)~ l z. 

Finally, it remains to demonstrate how to compute p k in fTSOj) . Notice 
that we can compute p k using fT2"Tj) : 

fc-i 

C^a k / 2 = C^ l a k /2 + y^(-l)^(pfa fc / 2 )Pi if k is even 

Pfc=S i= Vi 

C^b^i)^ = C^b^yz + ^(-l) J fi(pf6(fe-i)/ 2 )Pi if k is odd. 

The following pseudocode summarizes the algorithm for computing r = 
C k +i z: 



Algorithm 4.1: Proposed recursion to compute r = C^iz. 

for j = 0, . . . , k 
if j even 
c <- a i/2 ; 

else 

c <- 6 (j-i)/2; 

end 

Pj (7fe+i + ^) -1 c; 

for z = 0, . . . ,j — 1 

Pi + {- l Y v i{pJc)pi] 

end 

^.^l/(l + (-l)ipJ C ); 

r <r- r + (-l^fjofz)^; 
end 



Algorithm 14.11 requires 0(k 2 ) vector inner products. Operations with 
Cq and C\ can be hard-coded since Co is a scalar-multiple of the identity. 
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Experience has shown that k may be kept small (for example, Byrd et al. |2| 
suggest k G [3, 7]), making the extra storage requirements and computations 
affordable. 

5. Numerical Experiments 

We demonstrate the effectiveness of the proposed recursion formula by 
solving linear systems of the form fllUp with various sizes. Specifically, we let 
the number of updates M = 5 and the size of the matrix range from n = 10 3 
up to 10 7 . We implemented the proposed method in Matlab on a Two 2.4 
GHz Quad-Core Intel Xeon "Westmere" Apple Mac Pro and compared it 
to a direct method using the Matlab "backslash" command and the built-in 
conjugate-gradient (CG) method (pcg.m). Because of limitations in memory, 
we were only able to use the direct method for problems where n < 20, 000. 
In the tables below, we show the time and the relative residuals for each 
method. The relative residuals for the recursion formula are used as the 
criteria for convergence for CG. In other words, the time reported in this 
table reflects how long it takes for CG to achieve the same accuracy as the 
proposed recursion method, which is why the CG relative residuals are always 
less than those for the proposed recursion formula (except for one instance 
where the CG method stagnated.) 

Analysis. The three methods were run on numerous problems with vari- 
ous problems sizes, and we note that all methods achieve very small relative 
residual errors for each of the problems we considered. Besides from mem- 
ory issues, the direct method suffers from significantly longer computational 
time, especially for the larger problems. Generally, the recursion algorithm 
takes about one-fourth the amount of time as the CG method. The CG 
method requires 2M + 2 vector-vector products per iteration (2M for the 
matrix-vector product and 2 for other vector-vector products) and in exact 
arithmetic will converge in 2M + 1 iterations (because the matrix Bm in ( TTUj) 
is the sum of a scaled multiple of the identity with 2M rank-1 matrices, which 
means that Bm has at most 2M + 1 unique eigenvalues). However, from our 
computational experience, the number of iterations is closer to AM, which 
brings the total vector-vector multiplications for CG to around 8M 2 . Mean- 
while, the number of vector-vector multiplications for the recursion formula 
in Algorithm O is (2M + 1)(2M + 2)/2 = 2M 2 + 3M + 1, which explains 
why the CG algorithm takes roughly four times as long to achieve the same 
accuracy as the the proposed recursion algorithm. 
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n 




Direct 


Relat 


ive Residual 
CG 


Recursion 


1,000 


2 


.84e-14 


2 


.55e-14 


3.62e-14 


2,000 


6 


.59e-14 


5 


57e-14 


2.95e-13 


5,000 


1 


.02e-13 


7 


.83e-14 


8.83e-14 


10,000 


1 


.39e-13 


1 


.09e-13 


l.lle-13 


20,000 


2 


.63e-14 


2 


. 10e-14 


2.14e-14 



Table 1: A sample run comparing the relative residuals of the solutions using the Matlab 
"backslash" command, conjugate gradient method, and the proposed recursion formula. 
The relative residuals for the recursion formula are used as the criteria for convergence for 
CG. 



n 


Direct 


Time (sec) 
CG 


Recursion 


1,000 


0.0311 


0.0078 


0.0015 


2,000 


0.2068 


0.0099 


0.0019 


5,000 


1.3692 


0.0211 


0.0048 


10,000 


8.0280 


0.0306 


0.0083 


20,000 


51.7772 


0.0862 


0.0160 



Table 2: The computational times to achieve the results in Table [TJ 



n 




Relative 


residual 


Time 


(sec) 




CG 


Recursion 


CG 


Recursion 


100,000 


8 


71e-14 


1.50e-13 


0.2339 


. 0584 


200, 000 


1 


47e-14 


3.27e-14 


. 4686 


0.1155 


500,000 


4 


90e-14* 


3.55e-14 


1.6996 


0.3587 


1,000,000 


9 


99e-15 


1.03e-14 


6.0068 


1.0914 


2,000,000 


1 


10e-13 


6.54e-13 


15.9798 


2.5653 


5,000,000 


3 


08e-14 


4.84e-14 


30.2865 


6.2738 


10,000,000 


1 


33e-14 


3.97e-14 


67 . 3049 


11.5946 



Table 3: Comparison between the proposed recursion method and the built-in conjugate 
gradient (CG) method in Matlab. The relative residuals for the recursion formula are 
used as the criteria for convergence for CG. In other words, the time reported in this table 
reflects how long it takes for CG to achieve the same accuracy as the proposed recursion 
method. *In this case, CG terminated without converging to the desired tolerance because 
"the method stagnated." 
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6. Extensions 

The proposed recursion algorithm also computes products of the form 
(Bk + D)~ 1 z, where D is any positive-definite diagonal matrix. In this case, 
we must assume that each diagonal entry in D satisfies da > a for some 
a > 0. Provided 7^0" > e, a theorem similar to Theorem [3] will hold true 
mutatis mutandis: the only steps in the proof that need changing are ffl8|) , 
which becomes 

-2 1 n 1 1 

«o Co'ao = -^V4(^o + I?)" 1 ^ = —fr- £ "I— r( s o) • < 7- , 

7 fc ^0 s 7fcSo s ^ 7 fc + du 1 + 7fc^ 

and the cascading equations in (fl9]) . whose terms become z T Dz, which 

is greater than or equal to o"||^||l- Additionally, the recursion formula for 
diagonal updates need not be limited to L-BFGS systems. In particular, it 
is also applicable to other quasi-Newton systems where a recursion formula 
exists and the quasi-Newton matrices are guaranteed to be positive definite. 
For example, the proposed recursion will work with quasi-Newton matrices 
using the DFP updating formula, but it will not work with quasi-Newton 
matrices based on SRI, which are not guaranteed to be positive definite (for 
more information on the DFP and SRI method see, for example, [26|). 

7. Concluding remarks 

In this paper, we proposed an algorithm based on the SMW formula to 
solve systems of the form B + a I, where B is an n x n L-BFGS quasi-Newton 
matrix. We showed that as long as 7cr > e, the algorithm is well-defined. 
The algorithm requires at most M 2 vector inner products. (Note: We assume 
that M <^ n, and thus, M 2 is also significantly smaller than n.) While the 
algorithm is designed to handle constant diagonal updates of a quasi-Newton 
matrix, it can be extended to handle general diagonal updates of a quasi- 
Newton matrix. Furthermore, this algorithm can be extended to handle 
any quasi-Newton updating that ensures the quasi-Newton matrices are all 
positive definite. The algorithm proposed in this paper can be found at 
http : / /www . wf u . edu/~erwayjb/sof tware. 
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